NNAN (NEAREST NEIGHBOR ANALYSIS)

OREG (ORTHOGONALIZED REGRESSION ANALYSIS) 

OREH (ADDITIONAL ORTHOGONAL REGRESSION COEFFICIENTS) 


DECUS Program Library Write-up DECUS No. 8-328

NNAN (NEAREST NEIGHBOR ANALYSIS) 

SUMMARY 

This program computes, for each of a set of points, the nearest neighbor in n-dimensional 
space, and the distance to this neighbor. The program is intended for use with sets of data 
for which the number of dimensions has been reduced to a relatively small number of 
orthogonal axes, but the algorithm is applicable to other types of data. 

TAPES REQUIRED 

Form of program tape - The program is written in the PDP-8 FORTRAN-D language, and is 
in the source language. 

Form of data tape - The data to be analyzed should be punched onto paper tape in the ASCII
code. They should consist of the successive co-ordinates of the individuals, in a standard
order. The points themselves will be given serial numbers by the program, but these
identification numbers should not occur on the data tape. The output tapes from the CVAL
and GRMN programs are suitable for input without modification.

OPERATING INSTRUCTIONS 


.FORT
*OUT-S:NNAN
*
*IN-R:
*↑               Source program tape in high-speed reader
*READY
↑                Data tape in high-speed reader

The program will request the entry of the number of variables and data sets, and these
should be entered on the teletype and terminated by "Return". If an output tape is required,
switch on the low-speed punch before typing "Return".

If the program has already been compiled onto the disk, it may be called back into core
as follows:

.FOSL
*IN-S:NNAN
*
*OPT-
*↑
*READY
↑                Data tape in high-speed reader



OUTPUT 

The program prints, for each point in turn, the identification number of the point being 
considered, the identification of the nearest neighbor, and the distance between these two 
points. The output from this program is directly suitable for the CLAN program which 
carries out a cluster analysis. 

STORAGE AND LIMITATIONS 

Normal for FORTRAN-D. For the 4K version, the number of variables times the number of
sets must not exceed 260. This restriction can be relaxed by using the 8K version.

METHOD 

The program computes the distance between the reference point and every other point in the
set, and stores the shortest distance. The distance between points i and j, measured over
the n variables, is computed from the formula:

$$\text{Distance} = \sqrt{(X_{1i} - X_{1j})^2 + (X_{2i} - X_{2j})^2 + \cdots + (X_{ni} - X_{nj})^2}$$
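
The search described above can be sketched in a modern language. The following Python fragment is an illustration only, not a transcription of the FORTRAN-D program; the function name nearest_neighbors and the sample points are assumptions made for the example. It numbers the points from 1, as the program output does, and keeps the first neighbor found when two are equally close.

```python
import math


def nearest_neighbors(points):
    """Return (point number, nearest neighbor number, distance) for each point.

    Numbers are 1-based, in the order the points are supplied.
    """
    if len(points) < 2:
        raise ValueError("at least two points are needed")
    results = []
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbor
            # Euclidean distance over all co-ordinates
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
            if d < best_d:  # strict test keeps the first of any ties
                best_j, best_d = j, d
        results.append((i + 1, best_j + 1, best_d))
    return results


if __name__ == "__main__":
    sample = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]  # hypothetical data
    for number, neighbor, distance in nearest_neighbors(sample):
        print(number, neighbor, distance)
```

Every point is compared with every other point, so the work grows with the square of the number of points; for the small sets that fit in the 4K version this is of no consequence.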

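C; PROGRAM SEGMENT NNAN
DIMENSION V(260)
TYPE 100
100; FORMAT (/,"ENTER NOS OF VARIABLES AND SETS",/)
ACCEPT 101,N1,N2
101; FORMAT (I,I)
C; READ ALL N1*N2 CO-ORDINATES INTO V, ONE POINT AFTER ANOTHER
N3=N1*N2
DO 20 I=1,N3
READ 2,103,V(I)
20; CONTINUE
103; FORMAT (E)
C; N10 = REFERENCE POINT NUMBER, N4 = ITS OFFSET IN V
N10=1
N4=0
C; N8 = CANDIDATE POINT NUMBER, N5 = ITS OFFSET, V0 = SHORTEST DISTANCE
4; N5=0
N8=1
V0=1000000000.0
C; SKIP THE REFERENCE POINT ITSELF
3; IF (N5-N4) 5,1,5
5; N6=N4
N7=N5
N0=0
V1=0
C; SUM THE SQUARED DIFFERENCES OVER THE N1 CO-ORDINATES
2; V2=V(1+N6)-V(1+N7)
V1=V1+(V2*V2)
N0=N0+1
N6=N6+1
N7=N7+1
IF (N1-N0) 2,6,2
6; V1=SQTF(V1)
IF (V1-V0) 7,1,1
7; V0=V1
N9=N8
1; N5=N5+N1
N8=N8+1
IF (N2-N8) 8,3,3
C; PRINT POINT NUMBER, NEAREST NEIGHBOR AND DISTANCE
8; TYPE 102,N10,N9,V0
102; FORMAT (/,I,I,E)
N10=N10+1
N4=N4+N1
IF (N2-N10) 9,4,4
9; STOP
END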
OREG (ORTHOGONALIZED REGRESSION CALCULATION)


SUMMARY 

The program calculates the orthogonalized regression of one or more dependent variables 
on the principal components of up to 20 regressor variables. The basic data required are 
the correlation coefficients between the dependent variables and the regressor variables, 
and the latent roots and vectors of the correlation matrix of the regressor variables. 

TAPES REQUIRED 

Form of program tape - The program is written in the PDP-8 FORTRAN-D language, and is 
in the source language. 

Form of data tape - The data to be analyzed should be punched onto paper tape in the 
ASCII code. The data should begin with the latent roots for as many components as are 
required. The latent roots should then be followed by the latent vectors corresponding to 
each root. Finally, the coefficients of the correlations with the original variables should 
be given for each of the dependent variables, e.g. 

Latent roots

 3.2470   1.2753   0.3859   0.0700   0.0218

Latent vectors (punched by rows)

-0.5212   0.3121   0.4753   0.3245  -0.5471
-0.0711   0.7090   0.0381  -0.6943   0.0935
 0.4730   0.2161   0.8246  -0.2224   0.0105
 0.5219   0.5942   0.0103   0.5801   0.1945
-0.4757  -0.0112   0.3050   0.1629   0.8087

Consumption of oil

-0.457    0.032    0.899    0.601    0.710
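
The layout of the data tape can be illustrated by splitting the example values into their three parts. The Python fragment below is a sketch under stated assumptions: the function name split_oreg_tape is invented for the illustration, and the 25 latent-vector values are taken so that each consecutive group of five is one latent vector, in the order of the latent roots. The two checks at the end rely only on general properties of the principal components of a correlation matrix: the roots sum to the number of variables, and each vector has a sum of squares close to 1.

```python
def split_oreg_tape(values, n_vars, n_roots, n_dep):
    """Split a flat list of tape values into roots, vectors and correlations."""
    expected = n_roots + n_roots * n_vars + n_dep * n_vars
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    roots = values[:n_roots]
    start = n_roots
    # one latent vector of n_vars elements per root, in the same order
    vectors = [values[start + k * n_vars:start + (k + 1) * n_vars]
               for k in range(n_roots)]
    start += n_roots * n_vars
    # one set of n_vars correlations per dependent variable
    correlations = [values[start + d * n_vars:start + (d + 1) * n_vars]
                    for d in range(n_dep)]
    return roots, vectors, correlations


# Example values from the write-up, in tape order.
TAPE = [
    3.2470, 1.2753, 0.3859, 0.0700, 0.0218,
    -0.5212, 0.3121, 0.4753, 0.3245, -0.5471,
    -0.0711, 0.7090, 0.0381, -0.6943, 0.0935,
    0.4730, 0.2161, 0.8246, -0.2224, 0.0105,
    0.5219, 0.5942, 0.0103, 0.5801, 0.1945,
    -0.4757, -0.0112, 0.3050, 0.1629, 0.8087,
    -0.457, 0.032, 0.899, 0.601, 0.710,
]

roots, vectors, correlations = split_oreg_tape(TAPE, n_vars=5, n_roots=5, n_dep=1)

if __name__ == "__main__":
    print("sum of roots:", sum(roots))
    for vec in vectors:
        print("sum of squares:", sum(v * v for v in vec))
```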


OPERATING INSTRUCTIONS 

If the program has not previously been compiled onto the disk:

.FORT
*OUT-S:OREG
*
*IN-R:
*↑               Insert program tape in high-speed reader
*READY
↑                Insert data tape in high-speed reader

The program will request the entry of the numbers of variables, latent roots, and dependent
variables (correlations). These should be entered on the teletype, and terminated by
"Return".


If the program has already been compiled onto the disk, it may be called into the core
store as follows:

.FOSL
*IN-S:OREG
*
*OPT-
*READY
↑                Insert data tape in high-speed reader

OUTPUT



The program first prints the proportions of the variability of the dependent variable
accounted for by each of the components. These are followed by the orthogonalized
regression coefficients for each of the components.


STORAGE AND LIMITATIONS 


Normal for FORTRAN-D. The number of regressor variables must not exceed 20, and the
number of components times the number of regressor variables must not exceed 100.

METHOD 

The calculations follow closely the method described by Kendall in Chapter 5 of "A Course
in Multivariate Analysis". Each latent vector is first normalized by dividing its elements by
the square root of their sum of squares. The resulting elements are multiplied by the
corresponding correlation coefficients, and the sum of the products is divided by the latent
root. The reduction of the variance due to the fitting of each component is calculated, and
the coefficients of the orthogonalized regressions are expressed in terms of the
standardized variates.
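
The steps of the method can be sketched as follows. This Python fragment is an illustration under assumptions, not the program itself: the function name oreg_sketch is invented, and the two-variable input (a correlation of 0.6 between the regressors, giving latent roots 1.6 and 0.4) is hypothetical data chosen so that the result can be checked by hand.

```python
import math


def oreg_sketch(roots, vectors, corr):
    """Orthogonalized regression of one dependent variable.

    roots[k]   latent root of component k
    vectors[k] latent vector of component k (any scale)
    corr[j]    correlation of the dependent variable with regressor j
    Returns (proportions, coefficient sets), one entry per component.
    """
    proportions, coefficient_sets = [], []
    for root, vec in zip(roots, vectors):
        # normalize the latent vector to unit sum of squares
        norm = math.sqrt(sum(v * v for v in vec))
        unit = [v / norm for v in vec]
        # regression coefficient on the component
        b = sum(r * u for r, u in zip(corr, unit)) / root
        # proportion of variance accounted for by this component
        proportions.append(b * b * root)
        # coefficients expressed in terms of the standardized variates
        coefficient_sets.append([b * u for u in unit])
    return proportions, coefficient_sets


if __name__ == "__main__":
    # Hypothetical two-variable example, not taken from the write-up.
    props, sets = oreg_sketch([1.6, 0.4], [[1.0, 1.0], [1.0, -1.0]], [0.8, 0.5])
    print(props)
    print(sets)
```

When every component is retained, the proportions add up to the squared multiple correlation, and the coefficient sets add up, variable by variable, to the ordinary standardized regression coefficients (0.78125 and 0.03125 for the hypothetical data above).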
C PROGRAM TO COMPUTE ORTHOGONALIZED REGRESSIONS, OREG
DIMENSION V(100), CORR(20), PROP(20), REGR(20)
TYPE 104
104 FORMAT (/,"ENTER NO OF VARIABLES, ROOTS, CORRELATIONS",/)
ACCEPT 101,N,M,K
101 FORMAT (I,I,I)
C READ THE M LATENT ROOTS
DO 10 I=1,M
READ 2,102,PROP(I)
102 FORMAT (E)
10 CONTINUE
C READ THE M LATENT VECTORS OF N ELEMENTS EACH
NM=N*M
DO 20 I=1,NM
READ 2,102,V(I)
20 CONTINUE
C NORMALIZE EACH VECTOR TO UNIT SUM OF SQUARES
DO 40 I=1,M
PSI=0.0
DO 50 J=1,N
L=J+N*(I-1)
PSI=PSI+V(L)*V(L)
50 CONTINUE
PSI=SQTF(PSI)
DO 40 J=1,N
L=J+N*(I-1)
V(L)=V(L)/PSI
40 CONTINUE
C REPEAT FOR EACH OF THE K DEPENDENT VARIABLES
DO 30 IN=1,K
DO 60 I=1,N
READ 2,102,CORR(I)
REGR(I)=0.0
60 CONTINUE
DO 70 I=1,M
DO 70 J=1,N
L=J+N*(I-1)
REGR(I)=REGR(I)+CORR(J)*V(L)
70 CONTINUE
C COEFFICIENT ON EACH COMPONENT AND PROPORTION OF VARIANCE
DO 80 I=1,M
REGR(I)=REGR(I)/PROP(I)
CORR(I)=REGR(I)*REGR(I)*PROP(I)
TYPE 103,CORR(I)
103 FORMAT (/,E)
80 CONTINUE
C COEFFICIENTS IN TERMS OF THE STANDARDIZED VARIATES
J=0
DO 90 I=1,M
TYPE 105
105 FORMAT (/)
DO 100 L=1,N
LL=J+L
PSI=REGR(I)*V(LL)
TYPE 103,PSI
100 CONTINUE
J=J+N
90 CONTINUE
30 CONTINUE
STOP
END

OREH (PROGRAM TO ADD ORTHOGONALIZED REGRESSION COEFFICIENTS)


SUMMARY 

This program is an auxiliary program to OREG, the orthogonalized regression program. It
adds the corresponding orthogonalized regression coefficients for nominated components to
give a single vector of standardized regression coefficients. Being standardized, these
regression coefficients may be used to judge the relative importance of the original variables
in estimating values of the dependent variable.


TAPES REQUIRED 

Form of program tape - The program is written in the PDP-8 FORTRAN-D language, and is
in the source language.

Form of data tape - The data tape for this program is the output tape from the OREG program.
If an output tape was not obtained from the original run of this program, a special version
of OREG enables these results to be punched on the high-speed punch.


METHOD OF OPERATION 

Normal for the disk operating system. The data tape should be loaded into the high-speed 
reader before continuing after the teletype has printed READY. The program will request 
the entry of the number of variables and components and will pause after reading each set 
of orthogonalized regression coefficients. If the coefficients of that set are required to be 
added, type 1 (space). Otherwise, type 0 (space). 


OUTPUT 

The output from this program consists of the algebraic sum of the corresponding orthogonalized 
regression coefficients which have been nominated for inclusion by the operator. 
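
The addition can be sketched in a few lines. The Python fragment below is an illustration only; the function name add_selected and the sample coefficient sets are assumptions made for the example. Each set holds the coefficients for one component, and the operator's reply (1 to include, 0 to omit) is supplied as a list.

```python
def add_selected(coefficient_sets, replies):
    """Sum, variable by variable, the coefficient sets whose reply is non-zero."""
    if len(coefficient_sets) != len(replies):
        raise ValueError("one reply is needed for each coefficient set")
    totals = [0.0] * len(coefficient_sets[0])
    for coefficients, reply in zip(coefficient_sets, replies):
        if reply != 0:  # any non-zero reply nominates the set for inclusion
            for j, value in enumerate(coefficients):
                totals[j] += value
    return totals


if __name__ == "__main__":
    # Hypothetical coefficient sets for two components and two variables.
    sets = [[0.40625, 0.40625], [0.375, -0.375]]
    print(add_selected(sets, [1, 1]))  # both components included
    print(add_selected(sets, [1, 0]))  # first component only
```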

STORAGE AND LIMITATIONS 

Normal for FORTRAN-D. As for OREG, the number of initial variables is limited to 20. 
OREH

[Opening lines illegible in the source: a comment line ending "COEFFICIENTS", the DIMENSION statement for X and SX, and the TYPE 100 statement]
100 FORMAT (/,"ENTER NO OF VARIABLES",/)
ACCEPT 101,N
101 FORMAT (I)
DO 10 I=1,N
READ 2,102,X(I)
SX(I)=0.0
10 CONTINUE
DO 20 I=1,N
DO 30 J=1,N
READ 2,102,X(J)
30 CONTINUE
ACCEPT 101,K
IF (K) 1,20,1
1 DO 40 J=1,N
SX(J)=SX(J)+X(J)
40 CONTINUE
20 CONTINUE
DO 50 I=1,N
TYPE 102,SX(I)
50 CONTINUE
102 FORMAT (/,E)
STOP
END